Will the Identification of Reduplicated Multiword Expression (RMWE) Improve the Performance of SVM Based Manipuri POS Tagging?
Abstract
Reduplicated Multiword Expressions (RMWEs) are abundant in Manipuri, a highly agglutinative Indian language. A Part of Speech (POS) tagger for Manipuri based on Support Vector Machines (SVM) has been developed and evaluated. The tagger was then updated to use identified RMWEs as an additional feature, and its performance before and after adding this feature was compared. The baseline SVM based POS tagger achieved an F-Score of 77.67%, which increased to 79.61% (an absolute gain of 1.94 points) with RMWE as an additional feature. Thus, adding RMWE as a feature improved the performance of the POS tagger.
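To make "RMWE as an additional feature" concrete, the sketch below builds a per-token feature dictionary with an extra boolean RMWE flag. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the tagger's features or how RMWEs were identified. The detector here is hypothetical and handles only complete reduplication (a word immediately repeated), while the context window, affix features, and feature names are illustrative choices.

```python
from typing import Dict, List


def mark_complete_reduplication(tokens: List[str]) -> List[bool]:
    """Hypothetical, simplified RMWE detector: flags both tokens of any
    adjacent identical pair (complete reduplication only; partial, echo,
    and semantic reduplication are not handled)."""
    flags = [False] * len(tokens)
    for i in range(len(tokens) - 1):
        if tokens[i] == tokens[i + 1]:
            flags[i] = flags[i + 1] = True
    return flags


def token_features(tokens: List[str], i: int, rmwe: List[bool]) -> Dict[str, object]:
    """Illustrative feature set for token i (names are assumptions)."""
    word = tokens[i]
    return {
        "word": word,
        "prev_word": tokens[i - 1] if i > 0 else "<S>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "</S>",
        "prefix3": word[:3],  # affixes matter in an agglutinative language
        "suffix3": word[-3:],
        "is_rmwe": rmwe[i],   # the additional RMWE feature
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token of one sentence."""
    rmwe = mark_complete_reduplication(tokens)
    return [token_features(tokens, i, rmwe) for i in range(len(tokens))]


if __name__ == "__main__":
    # Placeholder tokens, not real Manipuri words.
    for feats in sentence_features(["w1", "w2", "w2", "w3"]):
        print(feats["word"], feats["is_rmwe"])
```

Dictionaries like these can be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM. The baseline and RMWE-augmented taggers then differ only in whether the `is_rmwe` key is present.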
Similar resources
Manipuri Chunking: An Incremental Model with POS and RMWE
This paper records work on Manipuri chunking using the commonly used Support Vector Machine (SVM) tool. Because Manipuri is a very highly agglutinative language, the features used to run the SVM must be selected carefully. An experiment is performed with 35,000 words to check whether POS tags and Reduplicated Multiword Expressions (RMWEs) can improve the Chunk identificatio...
Reduplicated MWE (RMWE) helps in improving the CRF based Manipuri POS Tagger
This paper gives a detailed overview of the modified feature selection in CRF (Conditional Random Field) based Manipuri POS (Part of Speech) tagging. Feature selection is so important in CRF that better features yield better output. This work is an attempt to make the previous work more efficient. Multiple new features are tried in running the CRF and ...
Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM
A web based Manipuri corpus is developed for the identification of reduplicated multiword expressions (MWEs) and multiword named entity recognition (NER). Manipuri is one of the rarely investigated languages, and its natural language processing resources are not available in the required measure. Manipuri web content is also very limited. A news corpus from a popular Manipuri news website is coll...
Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System
Language-specific multiword expressions (MWEs) play important roles in many natural language processing (NLP) tasks. The present work reports integrating reduplicated multiword expressions (RMWEs) into Phrase Based Statistical Machine Translation (PBSMT) between Manipuri, a highly agglutinative Tibeto-Burman language, and English to improve translation quality. In addition, Multiw...
Identification of Reduplicated Multiword Expressions Using CRF
This paper deals with the identification of Reduplicated Multiword Expressions (RMWEs), which is important for natural language applications such as Machine Translation and Information Retrieval. In the present task, reduplicated MWEs have been identified in Manipuri texts using the CRF tool. Manipuri is highly agglutinative in nature, and reduplication is very frequent in this language. The ...